Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Owned pixel buffer for no-copy presentation #65

Merged
merged 26 commits into from
Apr 6, 2023
Merged

Conversation

ids1024
Copy link
Member

@ids1024 ids1024 commented Jan 11, 2023

WIP implementation of #40.

Currently this only has Wayland. It can be run with cargo run --no-default-features --features wayland --example winit.

We should do some testing to see what impact it actually has.

This is based on the API that will be used for no-copy presentation. But
wraps it in `set_buffer`.

This also fixes the wayland buffer code to set `self.width` and
`self.height` on resize, and set the length of the shared memory file
when the buffer is created.
Currently only for `wayland` and the `winit` example.
@ids1024
Copy link
Member Author

ids1024 commented Jan 11, 2023

Benchmarking is always subtle and easy to mess up, but comparing the time set_buffer takes against resize/buffer_mut/present, it seems like in the winit example this has about an order of magnitude decrease in the overhead softbuffer adds? (very roughly 1ms to 100µs, for instance, depending on window size).

It makes sense that this would be pretty efficient. In the winit example we shouldn't ever wait on buffer releases since we only draw when the server requests a draw, and Wayland is generally an asynchronous protocol, so while there may be a few context switches we shouldn't actually need to wait on the compositor for anything. So copying the buffer was the main overhead here.

@ids1024
Copy link
Member Author

ids1024 commented Jan 11, 2023

Added code for Windows, using BitBlt. Looks like the performance is better there, but not by as much?

I guess it would be possible to write some portable automated tests winit::Window::set_inner_size to test at a particular size and test resizing. Maybe a timer to try redrawing periodically without a resize (or rust-windowing/winit#2412 if that was implemented). And something to calculate average/min/max/etc. times for each function call and the total.

@notgull
Copy link
Member

notgull commented Jan 11, 2023

I added the X11 implementation for this. While we're rewriting the API, do we want to make these functions return Results? On X11 at least there are a few failure states that can happen during the buffering process, and I either log or panic on them right now.

src/wayland/buffer.rs Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
@ids1024
Copy link
Member Author

ids1024 commented Jan 11, 2023

I added the X11 implementation for this. While we're rewriting the API, do we want to make these functions return Results? On X11 at least there are a few failure states that can happen during the buffering process, and I either log or panic on them right now.

Yeah, the question is whether every method should return a Result, or if some are expected to never fail. Like resize is clearly infallible as implemented on Wayland now (just setting member variables). But that could fail if it actually performed resizing. Or maybe if the u32 is out of range for i32, which Wayland uses (should never happen, so maybe better to just panic if anything).

For the public API, we should generally document what blocking, panicking, and errors are expected in each call. Including platform specific concerns if relevant.

@notgull
Copy link
Member

notgull commented Jan 11, 2023

Yeah, the question is whether every method should return a Result, or if some are expected to never fail.

At the very least, present() should return a Result, because that's usually where the most failure-prone API calls happen. I'm neutral on resize() and buffer_mut(). The way I have them implemented right now, errors can happen, it's just that unless there's an underlying I/O error or another call failed, it's unlikely to happen.

Also, I added support to the web backend for this PR.

src/win32.rs Show resolved Hide resolved
@ids1024
Copy link
Member Author

ids1024 commented Jan 11, 2023

So a lot of the questions here are tied to the design of the public API:

Surface sizes

  • Should we use u32 or NonZeroU32 for window sizes?
  • resize() should possibly panic if the value is somehow too large. (Too large for an i32 on Wayland; which would be an utterly absurd size. May be limited to 16-bit on some backends.)
  • If a size of 0 is allowed, how does this behave? Including resizing from a larger size.
  • Should Surface::new take width/height, or do we initialize to 0 or 1 and expect resize to be called later?

Thread-safety

  • softbuffer::Surface likely needs to be !Send/!Sync on some backends. This should be done explicitly and not backend-dependent.
  • Is it necessary/desirable for softbuffer::Context to be !Send/!Sync too?

Error handling

  • present() should definitely return a Result. I guess resize() and buffer_mut() should be too, but whether they actually fail will depend on backend?
  • Or resize() can be non-blocking and never fail, and defer any buffer allocating/resizing to the next buffer_mut() call.
  • If we want to not panic even on things like disconnect from the display server, we need to return a Result or silently ignore errors in any call that might communicate with the display server. (A later call like present may still get an error.)

Documentation

  • We should document what errors are expected on functions returning Result. And what may cause panics, if applicable.
  • The public API should document what assumptions are required for soundness. Like buffer_mut() trusts the display server not to mutate shared memory on backends using shared memory.
  • What calls block? I guess we assume things like buffer_mut() won't block for long if the display server is well-behaved.

@notgull
Copy link
Member

notgull commented Jan 11, 2023

My opinions, in no particular order:

  • resize() should possibly panic if the value is somehow too large. (Too large for an i32 on Wayland; which would be an utterly absurd size. May be limited to 16-bit on some backends.)

We should return an error instead; the maximum size varies between backends, so this is a legitimate failure state rather than an unexpected issue.

  • present() should definitely return a Result. I guess resize() and buffer_mut() should be too, but whether they actually fail will depend on backend?

present() and resize() should return a Result. buffer_mut() shouldn't and should just panic on failure. The case where is actually panics on failure (on X11 at least) is startingly rare.

  • Or resize() can be non-blocking and never fail, and defer any buffer allocating/resizing to the next buffer_mut() call.

I don't think this is a good idea; buffer_mut() should only block in the event that the SHM buffer is still being flushed out.

  • If we want to not panic even on things like disconnect from the display server, we need to return a Result or silently ignore errors in any call that might communicate with the display server. (A later call like present may still get an error.)

Silently ignoring errors is a recipe for disaster, I disagree with this on principle.

  • Should we use u32 or NonZeroU32 for window sizes?

NonZeroU32 should be used, IMO.

  • softbuffer::Surface likely needs to be !Send/!Sync on some backends. This should be done explicitly and not backend-dependent.

Yes, we should use PhantomData<*const ()> or something to avoid surprises.

  • Is it necessary/desirable for softbuffer::Context to be !Send/!Sync too?

Yes, to future-proof it.

  • Should Surface::new take width/height, or do we initialize to 0 or 1 and expect resize to be called later?

buffer_mut() should panic if it's called before resize().

@ids1024
Copy link
Member Author

ids1024 commented Jan 12, 2023

We should return an error instead; the maximum size varies between backends, so this is a legitimate failure state rather than an unexpected issue.

If the size is simply passed along from winit we know it must already be in range for the backend. (Are there expected uses where it might pass something out of range?). But returning an error is easy enough if we're making that function return a Result anyway. This would more be a consideration for if we didn't.

buffer_mut() shouldn't and should just panic on failure. The case where is actually panics on failure (on X11 at least) is startingly rare.
buffer_mut() should only block in the event that the SHM buffer is still being flushed out.

In the Wayland backend, buffer_mut currently also allocates and resizes the buffers. But with Wayland none of that should block (or I guess it would block of the socket write if the buffer was full, until the compositor reads?). And the only fallible part of that is allocating and mapping shared memory, which should only really happen when out of file descriptors / memory / virtual memory. So panics make sense.

That could be moved to resize. But regardless, both methods need to block on buffer release. Waiting for buffer release could block on the compositor for any amount of time but probably shouldn't block for long. And wayland-client could return a DispatchError. So that needs to be turned into a Err return value, ignored, or turned into a panic.

Silently ignoring errors is a recipe for disaster

Well, specifically if the error is permanent loss of connection to the display server, it would be fine since we'd be guaranteed to get the exact same error when present is called. But yeah, ignoring other errors would be problematic.

Yes, to future-proof it.

It also lets use use Rc/RefCell/etc instead of Arc/Mutex.

buffer_mut() should panic if it's called before resize().

Hm. Hadn't thought of that option. I guess a default doesn't entirely make sense. Either that isn't what you wanted, and a panic would be clearer, or if you really want that you can just call .resize(1, 1).

Currently in the winit example I have it calling .resize() and .buffer_mut() every draw. So another option would be to combine them into a single call, statically preventing this panic.

But if we add an option for alpha and even other (potentially not 32-bit) pixel formats, that also impacts the API here.

@i509VCB
Copy link
Contributor

i509VCB commented Jan 12, 2023

We should probably have a test to ensure that the Send and Sync impls are correct.

I believe the static_assertions crate can do that

@ids1024
Copy link
Member Author

ids1024 commented Jan 13, 2023

A simple macOS implementation for this API could just store a Vec that can be written into, then turned into a CGDataProvider and CGImage. So exactly like the current version but saving a copy. Swapping IOSurface front/back buffers may be viable as a more optimal solution (And these APIs seem to exist on iOS too).

@jackpot51 I'm not sure how this would be handled in the Orbital implementation. Unlike the others it fetches the current size of the window and maps it, then iterates over y copying each line. Rather than mapping a buffer for a exactly the width/height passed. (Also does this handle synchronization in some implicit way, or is there a possible race if the window is resized between getting the size and mapping, or while mapped?)

@ids1024
Copy link
Member Author

ids1024 commented Jan 14, 2023

Oh, I guess the size of the window on orbital only changes when the client sends a call to change it.

So that makes sense but clashes somewhat with the way the API here works. Probably the size should match, but we don't want to panic when a size that doesn't match the window size is passed.

@madsmtm
Copy link
Member

madsmtm commented Jan 15, 2023

We should probably have a test to ensure that the Send and Sync impls are correct.

I believe the static_assertions crate can do that

You can't really do that, since you'd need to specify a negative bound. However, you can do:

/// ```compile_fail
/// use softbuffer::Context;
/// fn needs_send<T: Send>() {}
/// needs_send::<Context>();
/// ```
/// ```compile_fail
/// use softbuffer::Surface;
/// fn needs_send<T: Send>() {}
/// needs_send::<Surface>();
/// ```
extern "C" {}
  • Should Surface::new take width/height, or do we initialize to 0 or 1 and expect resize to be called later?

Surface::new already takes HasRawWindowHandle, can't it just figure out the initial height itself from that? (At least, I know it can on macOS).

  • softbuffer::Surface likely needs to be !Send/!Sync on some backends. This should be done explicitly and not backend-dependent.

While we're at it, Surface::new on macOS is only safe to call from the main thread, so perhaps we could enforce that as well? Though I guess I could also work around that by submitting work to the main thread queue instead.

@i509VCB
Copy link
Contributor

i509VCB commented Jan 15, 2023

assert_not_impl_all is in static_assertions and should work.

@madsmtm
Copy link
Member

madsmtm commented Jan 15, 2023

assert_not_impl_all is in static_assertions and should work.

Huh, cool!

@ids1024
Copy link
Member Author

ids1024 commented Jan 17, 2023

While we're at it, Surface::new on macOS is only safe to call from the main thread, so perhaps we could enforce that as well? Though I guess I could also work around that by submitting work to the main thread queue instead.

Perhaps that should just be documented as a requirement of the unsafe new() functions. The caller is responsible for making sure the handles are valid as long as the Context/Surface exist. It make sense that would require they are called on the right thread if that is a requirement on the platform.

It would be nice if there could be an entirely safe API rather than the unsafely and unenforced lifetimes inherent in raw-window-handle... but I'm not sure how that could work across platforms.

@ids1024
Copy link
Member Author

ids1024 commented Mar 16, 2023

There are some other things I'd like to work on in Softbuffer, but I wouldn't really want to work on any non-trivial changes until this is merged.

Copy link
Member

@kchibisov kchibisov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few notes about Wayland impl.

src/cg.rs Outdated Show resolved Hide resolved
src/wayland/mod.rs Show resolved Hide resolved
src/wayland/mod.rs Show resolved Hide resolved
src/wayland/mod.rs Show resolved Hide resolved
Copy link
Member

@daxpedda daxpedda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The web implementation is LGTM.


Sorry to drop my 2c here a bit late to the party, but I was wondering why the original Surface::set_buffer() method was completely replaced instead of just amending buffer_mut() to Surface.

The issue is that if a user is unable to avoid not copying the buffer, for example they receive the buffer already as a Vec from somewhere else, if a platform doesn't support "no-copy" the user will be copying twice now: once from their buffer to the softbuffer buffer, then softbuffer will copy from it's buffer to the OS buffer.

This affects at least two Tier 1 platforms: X11 (sometimes) and MacOS. Additionally it affects Orbital (sometimes) and Web (in a weird way).

I propose to keep Surface::set_buffer() and maybe even go a step further, return an error in buffer_mut() if the surface can not be represented in a no-copy manner, making sure the user is aware and doesn't get the misconception that this is a no-copy operation.


This is off-topic for this PR, but originally I would have liked to introduce a new function in an extension trait for the Web target to be able to set the buffer in the correct format (RGBA vs 0RGB), avoiding the extra copying. This wouldn't play well with the missing Surface::set_buffer().

I saw in #17 that maybe there is the intention to introduce multiple Surface formats, which would solve that problem in a different, probably better, way.

{
// Map window buffer
let window_map =
unsafe { OrbitalMap::new(window_fd, window_width * window_height * 4) }
unsafe { OrbitalMap::new(self.window_fd(), window_width * window_height * 4) }
.expect("failed to map orbital window");

// Window buffer is u32 color data in 0xAABBGGRR format
Copy link
Member

@daxpedda daxpedda Mar 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackpot51 sorry for my naivety here, but this comment suggests that the buffer taken and returned is partly reversed compared to what is expected by the rest of the library, ABGR vs 0RGB. Isn't that a problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, good point. In my testing it seemed like the colors where the same on all platforms, but something isn't right with the documentation here, at least.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case the existing implementation isn't swapping components and this comment is already here, so this isn't something introduced in this PR. (Though if the format didn't match that might impact the possibility of no-copy on Orbital.)

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Show resolved Hide resolved
@ids1024
Copy link
Member Author

ids1024 commented Mar 29, 2023

I was wondering why the original Surface::set_buffer() method was completely replaced instead of just amending buffer_mut() to Surface.

Basically there are two advantages to removing it:

  1. Supporting both adds a bit of complexity to the library and all backends. Unless set_buffer is implemented using buffer_mut, which addresses any perceived "convince" but is problematic for performance, and deceptive if it results in 2 copies.
  2. If users of the library can easily use buffer_mut without just copying into it from another buffer, they should.

Arguably the crate is more intuitive with only one way to do things.

This affects at least two Tier 1 platforms: X11 (sometimes) and MacOS. Additionally it affects Orbital (sometimes) and Web (in a weird way).

Usually XShm should be available so the no-copy path can be used. If you use X over a network that is likely more of a bottleneck than the copy anyway, so that shouldn't be too much of an issue. I believe macOS should be able to support no-copy too, but that isn't implemented yet. The handling of Orbital is awkward, but generally correct use of the library should have a buffer size that matches window size.

For web... yeah, that's awkward if it can't support the same format that is used everywhere else.

I propose to keep Surface::set_buffer() and maybe even go a step further, return an error in buffer_mut() if the surface can not be represented in a no-copy manner, making sure the user is aware and doesn't get the misconception that this is a no-copy operation.

I definitely don't agree on making buffer_mut error if it isn't no-copy. Unless I'm missing something, that should be at least as efficient as a set_buffer method on all backends when the application is rendering into the buffer. And it just makes things more difficult for an application to have to handle this explicitly. If some applications will want to behave differently if no-copy is supported than if it isn't (when would that be applicable?), there could be a method returning a bool to query that.

In my opinion set_buffer doesn't seem that useful for normal application windows, since you'll want to dynamically render the window contents based on window size, etc. Though maybe for some niche applications, and for surfaces other than ordinary windows (like a drag-and-drop icon).

We could support both. It adds some complexity to the implementation, and we'll need to document to favor buffer_mut unless that could only be used by copying from another buffer.

I saw in #17 that maybe there is the intention to introduce multiple Surface formats, which would solve that problem in a different, probably better, way.

We definitely at least need a way to chose between alpha and non-alpha modes. There are a lot of formats that could be supported, depending on backend. So what is worth supporting here and how much complexity is worth adding is up for debate.

@daxpedda
Copy link
Member

Usually XShm should be available so the no-copy path can be used. If you use X over a network that is likely more of a bottleneck than the copy anyway, so that shouldn't be too much of an issue. I believe macOS should be able to support no-copy too, but that isn't implemented yet.

I wasn't aware of this. My motivation was the (potential) overhead when no-copy isn't available, so that's moot as soon as MacOS gets implemented.

I think for Web we need a different solution anyway, which we can address later.

@ids1024
Copy link
Member Author

ids1024 commented Mar 29, 2023

In particular, macOS (and iOS) has something called IOSurface which I think we could use with the right format, memory map, using a pair to swap as front/back buffers? But I have absolutely no familiarity with this API, so if anyone more familiar with Apple platforms knows anything about this, it would be good to hear how this sounds.

We can also add back set_buffer, or something like it, later if it seems necessary.

@ids1024
Copy link
Member Author

ids1024 commented Apr 4, 2023

Updated to use NonZeroU32. This would be more convenient if winit returned that instead of u32, but it's not too bad. We seem to be in agreement to leave out an API like set_buffer() for now, and change that later if it seems necessary.

@ids1024 ids1024 requested review from john01dav and kchibisov April 5, 2023 17:46
@ids1024 ids1024 mentioned this pull request Apr 5, 2023
Copy link
Member

@kchibisov kchibisov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use .get() on NonZero types, instead of u32::from.

@ids1024 ids1024 requested a review from kchibisov April 5, 2023 22:38
src/cg.rs Outdated Show resolved Hide resolved
@kchibisov kchibisov merged commit a09e4cf into master Apr 6, 2023
@kchibisov kchibisov deleted the owned-buffer branch April 6, 2023 07:31
@notgull
Copy link
Member

notgull commented Apr 6, 2023

Great job, everyone!

@ids1024 ids1024 mentioned this pull request Apr 6, 2023
@ids1024 ids1024 mentioned this pull request May 13, 2023
@notgull notgull mentioned this pull request Jun 4, 2023
notgull added a commit that referenced this pull request Jun 4, 2023
* On MacOS, the contents scale is updated when set_buffer() is called, to adapt when the window is on a new screen (#68).
* **Breaking:** Split the `GraphicsContext` type into `Context` and `Surface` (#64).
* On Web, cache the document in the `Context` type (#66).
* **Breaking:** Introduce a new "owned buffer" for no-copy presentation (#65).
* Enable support for multi-threaded WASM (#77).
* Fix buffer resizing on X11 (#69).
* Add a set of functions for handling buffer damage (#99).
* Add a `fetch()` function for getting the window contents (#104).
* Bump MSRV to 1.64 (#81).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

9 participants